416 research outputs found

    Sparse Partially Collapsed MCMC for Parallel Inference in Topic Models

    Topic models, and more specifically the class of Latent Dirichlet Allocation (LDA), are widely used for probabilistic modeling of text. MCMC sampling from the posterior distribution is typically performed using a collapsed Gibbs sampler. We propose a parallel sparse partially collapsed Gibbs sampler and compare its speed and efficiency to state-of-the-art samplers for topic models on five well-known text corpora of differing sizes and properties. In particular, we propose and compare two different strategies for sampling the parameter block with latent topic indicators. The experiments show that the increase in statistical inefficiency from only partial collapsing is smaller than commonly assumed, and can be more than compensated by the speedup from parallelization and sparsity on larger corpora. We also prove that the partially collapsed samplers scale well with the size of the corpus. The proposed algorithm is fast, efficient, exact, and can be used in more modeling situations than the ordinary collapsed sampler. Comment: Accepted for publication in Journal of Computational and Graphical Statistics.

    Making Sense of the Census: Classifying and Counting Ethnicity in Oceania, 1965-2011

    As the flagship government effort to count and classify its population, censuses are a key site for rendering and making visible group boundaries. Despite claims to objective rationality, however, census taking is a political and inherently subjective exercise. Censuses help shape the very categories they claim to capture: censuses do more than reflect social reality, they also participate in the social construction of this reality (Kertzer and Arel, 2002b, p. 2). While ethnicity – as a social construct – is imagined, its effects are far from imaginary, and census categorisations may have significant material consequences for the lives of citizens. Although an increasing number of studies have examined how and why governments in particular times or places count their populations by ethnicity, studies that are both cross-national and longitudinal are rare. Attempting to in part bridge this gap, this thesis studies census questionnaires from 1965 to 2011 for 24 countries in Oceania. In doing so, it explores three general questions: 1) how ethnicity is conceptualised and categorised in Oceanic censuses over time; 2) the relationship between ethnic counting in territories to that of their metropoles; and 3) Oceanic approaches towards multiple ethnic identities. Spread over an area of thirty million square kilometres of the Pacific Ocean, Oceania provides an interesting context to study ethnic counting. The countries and territories which make up the region present an enormous diversity in physical geography and culture, languages and social organization, size and resource endowment. As the last region in the world to decolonise, Oceania includes a mix of dependencies and sovereign states. The study finds that engagement with ethnic classification and counting is near-ubiquitous across the time period, with most countries having done so in all five cross-sectional census rounds. 
    In general terms, in ethnic census questions ‘racial’ terminology of race and ancestry has been displaced over the focal period by ‘ethnic’ terminology of ethnicity and ethnic origin. Overall, the concept of ethnic origins predominates, although interestingly it is paired with race in the US territories, reflecting the ongoing social and political salience of race in the metropole. With respect to ethnic categories provided on census forms (and thus imbued with the legitimacy of explicit state recognition) the study finds a shift away from the imagined and flawed Melanesian/Micronesian/Polynesian racial typology and other colonial impositions to more localised and self-identified Pacific identities. It is theorised that these shifts are emblematic of broader global changes in the impetuses for ethnic counting, from colonially-influenced ‘top down’ counting serving exclusionary ends to more inclusive, ‘bottom up’ approaches motivated by concerns for minority rights and inclusive policy-making.

    Delayed Sampling and Automatic Rao-Blackwellization of Probabilistic Programs

    We introduce a dynamic mechanism for the solution of analytically-tractable substructure in probabilistic programs, using conjugate priors and affine transformations to reduce variance in Monte Carlo estimators. For inference with Sequential Monte Carlo, this automatically yields improvements such as locally-optimal proposals and Rao-Blackwellization. The mechanism maintains a directed graph alongside the running program that evolves dynamically as operations are triggered upon it. Nodes of the graph represent random variables, edges the analytically-tractable relationships between them. Random variables remain in the graph for as long as possible, to be sampled only when they are used by the program in a way that cannot be resolved analytically. In the meantime, they are conditioned on as many observations as possible. We demonstrate the mechanism with a few pedagogical examples, as well as a linear-nonlinear state-space model with simulated data, and an epidemiological model with real data of a dengue outbreak in Micronesia. In all cases one or more variables are automatically marginalized out to significantly reduce variance in estimates of the marginal likelihood, in the final case facilitating a random-weight or pseudo-marginal-type importance sampler for parameter estimation. We have implemented the approach in Anglican and a new probabilistic programming language called Birch. Comment: 13 pages, 4 figures.

    Real-Time Probabilistic Programming

    Complex cyber-physical systems interact in real-time and must consider both timing and uncertainty. Developing software for such systems is both expensive and difficult, especially when modeling, inference, and real-time behavior need to be developed from scratch. Recently, a new kind of language has emerged -- called probabilistic programming languages (PPLs) -- that simplifies modeling and inference by separating the concerns between probabilistic modeling and inference algorithm implementation. However, these languages have primarily been designed for offline problems, not online real-time systems. In this paper, we combine PPLs and real-time programming primitives by introducing the concept of real-time probabilistic programming languages (RTPPL). We develop an RTPPL called ProbTime and demonstrate its usability on an automotive testbed performing indoor positioning and braking. Moreover, we study fundamental properties and design alternatives for runtime behavior, including a new fairness-guided approach that automatically optimizes the accuracy of a ProbTime system under schedulability constraints.

    Automatic Alignment in Higher-Order Probabilistic Programming Languages

    Probabilistic Programming Languages (PPLs) allow users to encode statistical inference problems and automatically apply an inference algorithm to solve them. Popular inference algorithms for PPLs, such as sequential Monte Carlo (SMC) and Markov chain Monte Carlo (MCMC), are built around checkpoints -- relevant events for the inference algorithm during the execution of a probabilistic program. Deciding the location of checkpoints is, in current PPLs, not done optimally. To solve this problem, we present a static analysis technique that automatically determines checkpoints in programs, relieving PPL users of this task. The analysis identifies a set of checkpoints that execute in the same order in every program run -- they are aligned. We formalize alignment, prove the correctness of the analysis, and implement the analysis as part of the higher-order functional PPL Miking CorePPL. By utilizing the alignment analysis, we design two novel inference algorithm variants: aligned SMC and aligned lightweight MCMC. We show, through real-world experiments, that they significantly improve inference execution time and accuracy compared to standard PPL versions of SMC and MCMC.

    A comparison of metacompilation approaches to implementing Modelica


    Bíborsügérek (Hemichromis guttatus Günther, 1862) a Hévízi-tó termálvizében = Jewel cichlids (Hemichromis guttatus Günther, 1862) in thermal water of Lake Hévíz (Western Hungary)

    We contend that repeatability of execution times is crucial to the validity of testing of real-time systems. However, computer architecture designs fail to deliver repeatable timing, a consequence of aggressive techniques that improve average-case performance. This paper introduces the Precision-Timed ARM (PTARM), a precision-timed (PRET) microarchitecture implementation that exhibits repeatable execution times without sacrificing performance. The PTARM employs a repeatable thread-interleaved pipeline with an exposed memory hierarchy, including a repeatable DRAM controller. Our benchmarks show an improved throughput compared to a single-threaded in-order five-stage pipeline, given sufficient parallelism in the software.